Knowledge Compilation for Lifted Probabilistic Inference: Compiling to a Low-Level Language

Authors

  • Seyed Mehran Kazemi
  • David Poole
Abstract

Algorithms based on first-order knowledge compilation are currently the state of the art for lifted inference. These algorithms typically compile a probabilistic relational model into an intermediate data structure and use that structure to answer many inference queries. In this paper, we propose compiling a probabilistic relational model directly into a low-level (e.g., C or C++) program instead of an intermediate data structure, thereby taking advantage of advances in program compilation. Our experiments show orders-of-magnitude speedups compared to existing approaches.

Probabilistic relational models (PRMs) (Getoor and Taskar 2007) are forms of graphical models in which there are probabilistic dependencies among relations of individuals. The problem of lifted inference for PRMs was first explicitly proposed by Poole (2003), who formulated it as first-order variable elimination (FOVE). Current representations for FOVE are not closed under all inference operations. Search-based algorithms (Gogate and Domingos 2011; Poole, Bacchus, and Kisynski 2011) were proposed as alternatives to FOVE algorithms and gained more popularity. Van den Broeck (2013) follows a knowledge compilation approach to lifted inference by evaluating a search-based lifted inference algorithm symbolically (instead of numerically) and extracting a data structure on which many inference queries can be answered efficiently. Lifted inference by knowledge compilation offers huge speedups because the compilation is done only once; many queries can then be answered efficiently by reusing the compiled model. The other algorithms, in contrast, must reason with the original model and repeat many of the operations to answer each new query, even though they can reuse a cache.
In this paper, we propose compiling relational models into low-level (e.g., C or C++) programs by symbolically evaluating a search-based lifted inference algorithm and extracting a program instead of a data structure, taking advantage of advances in program compilation. Our work is inspired by the work of Huang and Darwiche (2007), who turn exhaustive search algorithms into knowledge compilers by recording the trace of the search. Their traces can be seen as straight-line programs generated by searching the entire space. In contrast, our programs have loops over the values of the variables and result in more compact representations. Our generated programs have similar semantics to the FO-NNFs of Van den Broeck (2013). The inference engine for FO-NNFs can be viewed as an interpreter that executes an FO-NNF node by node, whereas we can compile our programs and perform inference faster. We focus on inference for Markov logic networks (MLNs) (Richardson and Domingos 2006), but our ideas can be used for other relational representations. We compile an MLN into a low-level program by symbolic evaluation of the lifted recursive conditioning algorithm (Gogate and Domingos 2011; Poole, Bacchus, and Kisynski 2011) because of its simplicity to describe and implement. Our idea, however, can be used with any search-based lifted inference algorithm.

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Notation and Background

A population is a set of individuals and corresponds to a domain in logic. A logical variable (LV) is written in lower case and is typed with a population; we let D_x represent the population associated with x and |D_x| represent the size of that population. A lower-case letter in bold represents a tuple of LVs. Constants, denoting individuals, are written starting with an upper-case letter. A parametrized random variable (PRV) is of the form R(t_1, ..., t_k), where R is a k-ary predicate symbol and each t_i is an LV or a constant. A grounding of a PRV is obtained by replacing each of its LVs with one of the individuals in its domain. A literal is an assignment of a PRV to True or False. We represent R(...) = True by r(...) and R(...) = False by ¬r(...). A world is an assignment of truth values to every grounding of every PRV. A formula is made up of literals connected with conjunctions and/or disjunctions. A weighted formula (WF) is a triple ⟨L, F, w⟩, where L is a set of LVs with |L| = ∏_{x∈L} |D_x|, F is a formula whose LVs are a subset of the LVs in L, and w is a real-valued weight. ⟨{x, y, z}, g(x, y) ∧ ¬h(y), 1.2⟩ is an example of a WF. For a given WF ⟨L, F, w⟩ and a world ω, we let η(L, F, ω) represent the number of assignments of individuals to the LVs in L for which F holds in ω. A Markov logic network (MLN) is a set of WFs and induces the following probability distribution:

    Prob(ω) = (1/Z) ∏_{⟨L,F,w⟩} exp(η(L, F, ω) · w)    (1)

where ω is a world, the product is over all WFs in the MLN, and Z = Σ_{ω'} ∏_{⟨L,F,w⟩} exp(η(L, F, ω') · w) is the partition (normalization) function. It is common to assume the formulae in the WFs of an MLN are in conjunctive or disjunctive form; we assume they are in conjunctive form. We assume input MLNs are shattered (de Salvo Braz, Amir, and Roth 2005) based on observations. An MLN can be conditioned on random variables by updating the WFs based on the observed values for the variables. It can also be conditioned on counts (the number of times a PRV with one LV is True or False), as in the following example:

Example 1. Suppose for an MLN containing the PRV R(x) we observe that R(x) = True for exactly 2 out of 5 individuals. We create two new LVs x1 and x2, representing the subsets of x having R True and False respectively, with |D_{x1}| = 2 and |D_{x2}| = 3. We update the PRVs based on the new LVs and replace r(x1) with True and r(x2) with False.
Lifted Recursive Conditioning

Algorithm 1 gives a high-level description of a search-based lifted inference algorithm obtained by combining the ideas in (Poole, Bacchus, and Kisynski 2011) and (Gogate and Domingos 2011). Following (Poole, Bacchus, and Kisynski 2011), we call the algorithm lifted recursive conditioning (LRC). The cache is initialized with ⟨{}, 1⟩.

Algorithm 1 LRC(MLN M)
  Input: A shattered MLN M.
  Output: Z(M).
  if ⟨M, Val⟩ ∈ Cache then return Val
  if ∃WF = ⟨L, F, w⟩ ∈ M s.t. F ≡ True then return exp(|L| · w) · LRC(M \ WF)
  if ∃WF = ⟨L, F, w⟩ ∈ M s.t. F ≡ False then return 2^{nterv(M, WF)} · LRC(M \ WF)
  if |CC = connected_components(M)| > 1 then return ∏_{cc ∈ CC} LRC(cc)
  if ∃x s.t. decomposer(M, x) then return LRC(decompose(M, x))^{#GCC(M, x)}
  Select P(x) from the branching order
  if P has no LVs then sum = Σ_{v ∈ {True, False}} LRC(M | P = v)
  if P has one LV x then sum = Σ_{i=0}^{|D_x|} C(|D_x|, i) · LRC(M | P = True exactly i times)
  Cache = Cache ∪ ⟨M, sum⟩
  return sum

For an input MLN M, LRC first checks a few possibilities that can potentially save computation. If Z(M) has been computed before, LRC returns the value from the cache. If the formula of a WF is equivalent to True, LRC evaluates the WF (exp(|L| · w)) and removes it from the set of WFs. If the formula of a WF is equivalent to False, LRC removes the WF. However, if there are random variables in this WF that do not appear in any other WF, LRC multiplies the result by 2^{nterv(M, WF)}, where nterv(M, WF) is the number of random variables totally eliminated from the ground MLN by removing the WF, to account for the possible value assignments to these variables. If the input MLN M can be divided into more than one connected component, Z(M) is the product of the Zs of these components. If the network consists of one connected component but the grounding is disconnected¹, the connected components of the grounding are identical up to renaming the constants from one or more LVs x; x is called the decomposer of the network.
In this case, LRC replaces the LVs in x by an assignment of individuals (i.e., decomposes the network on x), calculates Z of the new model, and raises it to the power of #GCC(M, x): the number of connected components in the grounding of M with x as the decomposer. If none of the above cases hold, LRC proceeds by case analysis on one of the PRVs. If the PRV P selected for case analysis has no LVs, Z(M) = Z(M | P = True) + Z(M | P = False). These two Zs can be computed recursively, as MLNs are closed under conditioning (6th if). If P has one LV x, the case analysis should sum over 2^{|D_x|} cases: one for each assignment of values to the |D_x| random variables. However, the individuals are exchangeable: we only care about the number of times P(x) is True, not about which individuals make it True. Thus, we sum over only |D_x| + 1 cases, with the i-th case being the one where P(x) is True for exactly i out of the |D_x| individuals. We also multiply the i-th case by C(|D_x|, i) to account for the number of different assignments to the individuals in D_x for which P(x) is True exactly i times. The sum can be computed with |D_x| + 1 recursive calls (7th if).

In this paper, we assume the input MLN is recursively unary:

Definition 1. An order for the PRVs of an MLN is a recursively unary order if, when performing case analysis on the PRVs in that order while recognizing disconnectedness and decomposability, no population needs to be grounded.

Definition 2. An MLN is recursively unary if there exists at least one recursively unary order for its PRVs.

Other MLNs can be partially grounded and turned into recursively unary MLNs offline. There exist heuristics that guarantee generating recursively unary orders for recursively unary networks. Using such heuristics, we do not need to consider in Algorithm 1 the case where the PRV selected for case analysis has two or more LVs.

Compiling to a Target Program

Algorithm 1 finds the Z of the input MLN.
Inspired by the knowledge compilation approach of Van den Broeck (2013) and its advantages, we develop Algorithm 2, which evaluates Algorithm 1 symbolically and extracts a low-level program instead of finding Z. We chose C++ as the low-level language because of its efficiency, availability, and available compiling/optimizing packages.

¹ See (Poole, Bacchus, and Kisynski 2011) for a detailed analysis.

In Algorithm 2, VNG() (variable name generator) and ING() (iterator name generator) return unique names to be used for C++ variables and loop iterators respectively. Each call LRC2CPP(M, vname) returns C++ code that stores Z(M) in a variable named vname.

Example 2. Consider compiling the MLN M1 with a WF ⟨{x, s}, f(x) ∧ g(x, s) ∧ h(s), 1.2⟩, |D_x| = 5, and |D_s| = 8 into a C++ program by following Algorithm 2. Initially, we call LRC2CPP(M1, "v1"). Suppose the algorithm chooses F(x) for a case analysis. Then it generates:

    v1 = 0;
    for (int i = 0; i <= 5; i++) {
      <code for LRC2CPP(M2, "v2")>
      v1 += C(5, i) * v2;
    }

where M2 ≡ M1 | F(x) = True exactly i times. In M2, s is a decomposer and #GCC(M2, s) = |D_s| = 8. Therefore, LRC2CPP(M2, "v2") generates:

    <code for LRC2CPP(M3, "v3")>
    v2 = pow(v3, 8);

where M3 ≡ decompose(M2, s). In M3, s is replaced by an individual, say, S. LRC2CPP(M3, "v3") may proceed by a case analysis on H(S) and generate:

    <code for LRC2CPP(M4, "v4")>
    <code for LRC2CPP(M5, "v5")>
    v3 = v4 + v5;

where M4 ≡ M3 | H(S) = True and M5 ≡ M3 | H(S) = False. We can follow the algorithm further and generate the rest of the program.

Caching as Needed

During compilation, we record which cache entries are used later and remove all other cache inserts from the C++ program.

Pruning

Since our target program is C++, we can take advantage of the available packages for pruning/optimizing C++ programs. In particular, we compile our programs with the O3 flag, which optimizes the code at compile time.
Using O3 slightly increases the compile time but substantially reduces the run time when the program and the population sizes are large. We show the effect of pruning our programs in the experiments.

MinNestedLoops Heuristic

The maximum number of nested loops (MNNL) in the C++ program is a good indicator of the time complexity and a suitable criterion to minimize with an elimination ordering heuristic. To do so, we start with the order produced by the MinTableSize (MTS) heuristic (Kazemi and Poole 2014), count the MNNL in the C++ program generated with this order, and then perform k steps of stochastic local search on the order to minimize the MNNL. We call this heuristic MinNestedLoops (MNL). Note that the search is performed only once, in the compilation phase. The value of k can be set according to how large the network is, how large the population sizes are, how much time we want to spend on compilation, etc.

Proposition 1. The MinNestedLoops heuristic generates recursively unary orders for recursively unary networks.

Algorithm 2 LRC2CPP(MLN M, String vname)
  Input: A shattered MLN M and a variable name.
  Output: A C++ program storing Z(M) in vname.
  if M ∈ Cache then return "{vname} = cache.at({M.id});"
  if ∃WF = ⟨L, F, w⟩ ∈ M s.t. F ≡ True then
    nname = VNG()
    return LRC2CPP(M \ WF, nname) + "{vname} = {nname} * exp(w * |L|);"
  if ∃WF = ⟨L, F, w⟩ ∈ M s.t. F ≡ False then
    nname = VNG()
    return LRC2CPP(M \ WF, nname) + "{vname} = {nname} * pow(2, {nterv(M, WF)});"
  if |CC = connected_components(M)| > 1 then
    retVal = ""
    for each cc ∈ CC do
      nname[cc] = VNG()
      retVal += LRC2CPP(cc, nname[cc])
    return retVal + "{vname} = {nname.join("*")};"
  if ∃x s.t. decomposer(M, x) then
    nname = VNG()
    return LRC2CPP(decompose(M, x), nname) + "{vname} = pow({nname}, {#GCC(M, x)});"
  Select P(x) from the branching order
  if P has no LVs then
    nname1 = VNG(), nname2 = VNG()
    retVal = LRC2CPP(M | P = True, nname1)
    retVal += LRC2CPP(M | P = False, nname2)
    retVal += "{vname} = {nname1} + {nname2};"
  if P has one LV x then
    retVal = "{vname} = 0;"
    i = ING(), nname = VNG()
    retVal += "for (int {i} = 0; {i} <= {|Dx|}; {i}++) {"
    retVal += LRC2CPP(M | P = True exactly i times, nname)
    retVal += "{vname} += C({|Dx|}, {i}) * {nname};"
    retVal += "}"
  retVal += "cache.insert({M.id}, {vname});"
  Cache = Cache ∪ M
  return retVal

Experiments and Results

We implemented Algorithm 2 in Ruby and ran our experiments using Ruby 2.1.5 on a 2.8GHz core with 4GB RAM under Mac OS X. We used the g++ compiler with the O3 flag to compile and optimize the C++ programs. We compared our results with weighted first-order model counting (WFOMC)² and probabilistic theorem proving (PTP)³, the state-of-the-art lifted inference algorithms. We allowed at most 1000 seconds for each algorithm. All queries were run multiple times and the average was reported. When using MNL as our heuristic, we performed 25 iterations of local search. Fig. 1 represents the overall running times of LRC2CPP

² Available at: https://dtai.cs.kuleuven.be/software/wfomc
³ Available in Alchemy-2 (Kok et al. 2005)


Publication date: 2016